Add `dtype-i128` feature flag and `Int128Type` #6374

plaflamme · 2023-01-22T16:07:06Z

This adds a new datatype, `Int128Type`, which is the primitive type Arrow uses to represent Decimal types. The new type is behind a `dtype-i128` feature flag.

Introducing this type requires abstracting over the arithmetic used for `PrimitiveArray`, because decimal arithmetic must keep track of changes to precision and scale. This PR adds a new trait, `ArrayArithmetics`, which delegates to `basic` for all primitive types except `i128`, and to `decimal` for `i128`. There's probably a better approach, but this is the extent of what my Rust-fu is capable of delivering.
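To make the delegation idea concrete, here is a minimal, self-contained sketch. It is not the code from this PR: `DecimalMeta`, the `Meta` associated type, the method names, and the simplified precision/scale rules are all assumptions chosen for illustration. Both operands are assumed to share the same precision and scale.

```rust
// Hypothetical sketch; names and signatures are illustrative only.

/// Stand-in for the decimal metadata that must travel with `i128` values.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct DecimalMeta {
    pub precision: usize,
    pub scale: usize,
}

/// Abstracts over "basic" arithmetic and decimal arithmetic.
pub trait ArrayArithmetics: Sized + Copy {
    /// Extra type information; `()` for ordinary primitives.
    type Meta: Copy;

    fn add_arrays(lhs: &[Self], rhs: &[Self], meta: Self::Meta) -> (Vec<Self>, Self::Meta);
    fn mul_arrays(lhs: &[Self], rhs: &[Self], meta: Self::Meta) -> (Vec<Self>, Self::Meta);
}

// "basic" path: plain element-wise arithmetic, no metadata to track.
macro_rules! impl_basic {
    ($($t:ty),*) => {$(
        impl ArrayArithmetics for $t {
            type Meta = ();
            fn add_arrays(lhs: &[Self], rhs: &[Self], _meta: ()) -> (Vec<Self>, ()) {
                (lhs.iter().zip(rhs).map(|(a, b)| a + b).collect(), ())
            }
            fn mul_arrays(lhs: &[Self], rhs: &[Self], _meta: ()) -> (Vec<Self>, ()) {
                (lhs.iter().zip(rhs).map(|(a, b)| a * b).collect(), ())
            }
        }
    )*};
}
impl_basic!(i32, i64, f64);

// "decimal" path: the result carries updated precision/scale.
impl ArrayArithmetics for i128 {
    type Meta = DecimalMeta;

    fn add_arrays(lhs: &[i128], rhs: &[i128], meta: DecimalMeta) -> (Vec<i128>, DecimalMeta) {
        let out = lhs.iter().zip(rhs).map(|(a, b)| a + b).collect();
        // Simplified rule: a sum may need one more digit; scale is unchanged.
        (out, DecimalMeta { precision: meta.precision + 1, scale: meta.scale })
    }

    fn mul_arrays(lhs: &[i128], rhs: &[i128], meta: DecimalMeta) -> (Vec<i128>, DecimalMeta) {
        let out = lhs.iter().zip(rhs).map(|(a, b)| a * b).collect();
        // Simplified rule: precisions and scales of the two operands add up.
        (out, DecimalMeta { precision: meta.precision * 2, scale: meta.scale * 2 })
    }
}
```

With this shape, multiplying `1.50` by `2.00` (stored as `150` and `200` at scale 2) yields `30000` at scale 4, while the `i32` implementation returns `()` as its metadata.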

There are a few problems:

- `div_scalar` in `basic` and `decimal` do not have equivalent signatures: it is unclear how we can abstract over this
- `rem` and `rem_scalar` are not implemented in `decimal` (this doesn't seem like a blocker)

This is obviously just a draft to get some feedback.

Decimal arithmetics require manipulating the DataType when doing some operations, i.e.: changing precision/scale

ritchie46

Thanks for the PR. I think we should not think of decimal yet and just implement i128 as its own type.

The future decimal type will be a wrapper around this and that type will have to figure out how to deal with arithmetic, it is not a concern of the physical i128 type. So the arithmetic can continue as is.

ritchie46 · 2023-01-22T16:28:56Z

polars/polars-core/src/datatypes/mod.rs

@@ -82,6 +82,8 @@ impl_polars_datatype!(Int8Type, Int8, i8);
 impl_polars_datatype!(Int16Type, Int16, i16);
 impl_polars_datatype!(Int32Type, Int32, i32);
 impl_polars_datatype!(Int64Type, Int64, i64);
+#[cfg(feature = "dtype-i128")]
+impl_polars_datatype!(Int128Type, Unknown, i128);


Unknown should be Int128Type

Presumably, you mean Int128 (which would be DataType::Int128)?

If so, I don't think that will work because I think DataType will eventually be something like DataType::Decimal(precision, scale), right?

Yes, but then I think we should make the DataType something like DataType::Decimal(Option<precision, scale>). Then we can fill in the blanks later.
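As a rough illustration of the "fill in the blanks later" idea, a dtype variant can carry optional precision and scale. The enum below is a trimmed, hypothetical stand-in and not the actual polars `DataType`; the `(usize, usize)` tuple layout and the `scale` helper are assumptions.

```rust
/// Hypothetical, heavily trimmed stand-in for a logical dtype enum.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum DataType {
    Int64,
    /// `(precision, scale)`; `None` means they are not known yet.
    Decimal(Option<(usize, usize)>),
    Unknown,
}

impl DataType {
    /// Returns the scale if this is a decimal with known metadata.
    pub fn scale(&self) -> Option<usize> {
        match self {
            DataType::Decimal(Some((_precision, scale))) => Some(*scale),
            _ => None,
        }
    }

    /// Whether the dtype is backed by 128-bit integers (illustrative only).
    pub fn is_i128_backed(&self) -> bool {
        matches!(self, DataType::Decimal(_))
    }
}
```

Code that only cares about the physical layout can match on `Decimal(_)`, while code that needs the metadata can handle the `None` case explicitly.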

plaflamme · 2023-01-22T19:23:57Z

@ritchie46 Thanks for the comments. The issue is that NumericNative has a trait bound of NativeArithmetics here. But that's not implemented for i128 precisely because the arithmetics for i128 are not the same as the native i128 Rust type (because it also needs to track precision and scale changes). This is mentioned here.

So when we introduce i128 as a polars "primitive" type, we can no longer respect this trait bound and must already abstract over it.

Unless what you're suggesting is that we don't use i128 and use a wrapper type so we can implement NativeArithmetics for it?
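The trait-bound problem and the wrapper-type option can be sketched in isolation. The traits below reuse the names `NativeArithmetics` and `NumericNative` but are simplified stand-ins written for this example, not the real definitions; `I128Wrapper` and `sum` are hypothetical.

```rust
use std::ops::{Add, Mul};

/// Simplified stand-in: implemented only for "natively arithmetic" types.
pub trait NativeArithmetics: Add<Output = Self> + Mul<Output = Self> + Copy {}
impl NativeArithmetics for i64 {}
// Deliberately no impl for `i128`, mirroring the situation described above.

/// Simplified stand-in: requires `NativeArithmetics` as a supertrait.
pub trait NumericNative: NativeArithmetics {}
impl NumericNative for i64 {}
// `impl NumericNative for i128 {}` would not compile: the bound is unmet.

/// Hypothetical wrapper that can opt in to the arithmetic trait.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct I128Wrapper(pub i128);

impl Add for I128Wrapper {
    type Output = Self;
    fn add(self, rhs: Self) -> Self {
        I128Wrapper(self.0 + rhs.0)
    }
}

impl Mul for I128Wrapper {
    type Output = Self;
    fn mul(self, rhs: Self) -> Self {
        I128Wrapper(self.0 * rhs.0)
    }
}

impl NativeArithmetics for I128Wrapper {}
impl NumericNative for I128Wrapper {}

/// Generic code written against the bound works for the wrapper too.
pub fn sum<T: NumericNative>(values: &[T], zero: T) -> T {
    values.iter().fold(zero, |acc, v| acc + *v)
}
```

The trade-off is that the wrapper treats values as plain integers, so any precision or scale bookkeeping has to live somewhere else.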

plaflamme · 2023-01-26T20:03:35Z

@ritchie46 Any additional feedback or guidance you could provide? I might have some more time over the next few days to look at this.

ritchie46

Could you make clippy happy, then we can go forward.

ritchie46 · 2023-01-31T07:23:35Z

Yes, but then I think we should make the DataType something like DataType::Decimal(Option<precision, scale>). Then we can fill in the blanks later.

plaflamme · 2023-01-31T14:52:13Z

Alrighty, I've fixed the warning in b803b6d and added `DataType::Decimal128` in 0d0a898. PTAL.

ritchie46 · 2023-02-01T09:48:22Z

Thanks @plaflamme. The next step is adding this Series. Would you want to have this merged first and then continue on the Series later?

plaflamme · 2023-02-01T14:42:23Z

If we can get this merged, that would be preferable, yeah. Though I'd be worried with the todo!()s introduced since those will cause panics which isn't particularly good UX!

ritchie46 · 2023-02-02T07:11:53Z

> If we can get this merged, that would be preferable, yeah. Though I'd be worried with the todo!()s introduced since those will cause panics which isn't particularly good UX!

Nope, it is an incremental effort.

ritchie46 · 2023-02-02T07:13:16Z

Thanks @plaflamme. Next up a Series of Decimal 🚀 ;)

plaflamme · 2023-02-03T05:23:35Z

@ritchie46 great, thanks for merging this. Do you have any pointers to get me started on Series? I took a very brief look and it seems like there are a few things that will need attention. Would you suggest starting with the implementations module?

ritchie46 · 2023-02-03T21:04:28Z

Yes. And you need to make a newtype, `ChunkedDecimal`, that wraps `ChunkedArray<128>`.

plaflamme · 2023-02-03T23:31:12Z

@ritchie46 am I on the right track with this?

ritchie46 · 2023-02-04T05:07:44Z

Yes! Most methods can be simply dispatched to ChunkedArray<128> from the Series implementation.
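The newtype-and-dispatch pattern described here might look roughly like the following. This is only a sketch: the `ChunkedArray` below is a toy stand-in backed by `Vec<Vec<T>>`, not the polars type, and the constructor, field names, and method set are assumptions.

```rust
/// Toy stand-in for a chunked array of physical values (not the polars type).
pub struct ChunkedArray<T> {
    chunks: Vec<Vec<T>>,
}

impl<T: Copy> ChunkedArray<T> {
    pub fn from_chunks(chunks: Vec<Vec<T>>) -> Self {
        ChunkedArray { chunks }
    }

    /// Total number of values across all chunks.
    pub fn len(&self) -> usize {
        self.chunks.iter().map(|c| c.len()).sum()
    }

    /// Looks up a value by its global index, walking chunk by chunk.
    pub fn get(&self, idx: usize) -> Option<T> {
        let mut idx = idx;
        for chunk in &self.chunks {
            if idx < chunk.len() {
                return Some(chunk[idx]);
            }
            idx -= chunk.len();
        }
        None
    }
}

/// Hypothetical logical wrapper: physical `i128` values plus decimal metadata.
pub struct ChunkedDecimal {
    inner: ChunkedArray<i128>,
    scale: usize,
}

impl ChunkedDecimal {
    pub fn new(inner: ChunkedArray<i128>, scale: usize) -> Self {
        ChunkedDecimal { inner, scale }
    }

    pub fn scale(&self) -> usize {
        self.scale
    }

    // Methods that do not depend on the metadata simply dispatch to the inner array.
    pub fn len(&self) -> usize {
        self.inner.len()
    }

    pub fn get(&self, idx: usize) -> Option<i128> {
        self.inner.get(idx)
    }
}
```

Only operations that change or depend on precision and scale would need logic in the wrapper itself; everything else forwards to the physical array.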

If you open a PR we could track this.

plaflamme · 2023-02-04T20:46:54Z

@ritchie46 here's a PR: #6374

aldanor · 2023-02-22T22:55:35Z

@plaflamme Thanks for your work on this! I'm very excited to see int128 support becoming full-fledged (and yet another thing to distance polars from pandas...).

@ritchie46 I wonder what the approximate roadmap is for getting int128/decimals supported all the way through to the Python layer. What are the next steps from here? (I could try to contribute myself if the tasks are reasonably narrow-scoped and if it helps speed things up.)

plaflamme · 2023-02-22T23:32:25Z

@aldanor I've started some work in this branch. I don't really know what I'm doing, I'm mostly following breadcrumbs from datetime and other types that are similar to Decimal. In any case, help on this would be appreciated, feel free to make PRs against my branch.

@ritchie46 any guidance would be appreciated

aldanor · 2023-02-23T20:00:55Z

@plaflamme Maybe open a PR in your own repo and tag me? Skimming through, I might have a few random suggestions - would be easier to comment in the PR

aldanor · 2023-02-23T22:29:12Z

Actually, one (perhaps weird) question to discuss before we hack too deep: could we just get rid of precision altogether? This would potentially simplify quite a lot of the code.

It won't change any computational logic or affect any casting as it's used solely for validation. Can't we just always use max-precision, i.e. 38, internally?

Off the top of my head, the only thing it can visibly affect is formatting, i.e. 123.45 with (5, 2) will get printed as "123.45" whereas 123.45 with (7, 3) will get printed as "0123.450", IIRC. Well... that, plus validation errors that may pop up at runtime when a value doesn't fit the declared precision. Do we want that?
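To separate the two roles discussed above, here is a small sketch in which scale alone drives formatting and precision is used only as a validation check. Both function names are hypothetical, and the sketch does not attempt to reproduce any precision-dependent padding that an existing formatter might apply.

```rust
/// Formats an unscaled `i128` as a decimal string with `scale` fractional digits.
pub fn format_decimal(value: i128, scale: usize) -> String {
    let sign = if value < 0 { "-" } else { "" };
    let digits = value.unsigned_abs().to_string();
    if scale == 0 {
        return format!("{sign}{digits}");
    }
    // Left-pad with zeros so there is at least one digit before the point.
    let padded = format!("{:0>width$}", digits, width = scale + 1);
    let (int_part, frac_part) = padded.split_at(padded.len() - scale);
    format!("{sign}{int_part}.{frac_part}")
}

/// Validation-only use of precision: does the value fit in `precision` digits?
pub fn fits_precision(value: i128, precision: u32) -> bool {
    // 10^38 still fits in a u128; larger precisions are rejected here.
    precision <= 38 && value.unsigned_abs() < 10u128.pow(precision)
}
```

Under this split, dropping precision would remove only the `fits_precision` check; the printed digits depend on the scale.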

Another related question is whether scale (and precision if it's used) should be an Option<>? If we need an Int128Type, shouldn't it be a separate thing/datatype? Also, there's no such datatype as "int128" in arrow, only the Decimal128 itself.

plaflamme added 3 commits January 22, 2023 10:51

feat: add new dtype-i128 feature flag

296cf14

feat: add DataType::Int128

6c874de

feat: define ArrayArithmetics to abstract over basic vs decimal

e910c22

Decimal arithmetics require manipulating the DataType when doing some operations, i.e.: changing precision/scale

plaflamme mentioned this pull request Jan 22, 2023

Support for Decimal series? #4104

Closed

plaflamme added 3 commits January 31, 2023 09:15

Merge remote-tracking branch 'upstream/master' into dtype-i128

90681fd

tidy: fix clippy warning

b803b6d

feat: add DataType::Decimal128

0d0a898

plaflamme force-pushed the dtype-i128 branch from 5bdee77 to 0d0a898 Compare January 31, 2023 14:50

ritchie46 merged commit 8ff0dcf into pola-rs:master Feb 2, 2023

plaflamme deleted the dtype-i128 branch February 3, 2023 05:23

sslivkoff mentioned this pull request Apr 12, 2023

pl.read_parquet() and pl.write_parquet() for pl.Decimal #8191

Closed

This was referenced Apr 14, 2023

Add support for FixedSizeList from Arrow instead of crashing on from_arrow #8023

Closed

Add dtype-fixed-size-list feature flag and FixedSizeListType #8245

Closed

alexander-beedie added the A-dtype-decimal Area: decimal data type label Jan 16, 2024

wence- mentioned this pull request May 14, 2024

stack overflow in physical executor with very deep expression trees #16225

Closed
